ABSTRACT

Data clustering is a popular data analysis approach used to organize
data into sensible clusters based on a similarity measure, so that data within a
cluster are similar to each other but dissimilar to data in other clusters.
The clustering problem has been proven to be NP-hard; thus,
it is commonly tackled with meta-heuristic algorithms such as particle swarm
optimization (PSO), genetic algorithm (GA), and ant colony optimization
(ACO). This paper proposes an algorithm called Fast Ant
Colony Optimization for Clustering (FACOC) to reduce the computation
time of ACO in the clustering problem. FACOC is
motivated by the observation that redundant computation occurs in
ACO for clustering. This redundant computation can be cut in order to
reduce the computation time. The proposed FACOC
algorithm was verified on 5 well-known benchmarks. Experimental results
show that by cutting this redundant computation, the computation time can
be reduced by about 28% while suffering only a small quality degradation.
1. INTRODUCTION

Clustering is a problem whose goal is to find the hidden structure behind a dataset. It is done by grouping
the data into several clusters according to their similarity. Clustering falls into the category of optimization
problems, as it needs to estimate the optimal position of unknown cluster centers. Unlike many other optimization
problems, clustering plays a wide role in the recent development of computer science [1]. It has been
applied in various fields: data mining [2-5], image processing [6-8], geographical information
systems [9-12], computational biology [13, 14], road accident analysis [15], routing protocols [16], and power
consumption [17]. Because of its significance, a lot of research has been conducted in order to
improve the performance of clustering algorithms.

One of the performance indicators that needs to be improved in clustering algorithms is
computation time. As we enter the era of big data, the amount of retrieved data is growing massively.
Analyzing those data with traditional clustering algorithms may not be feasible in a reasonable amount of time.
Thus, it is essential to make clustering algorithms time-efficient. Because of its importance,
several studies have been conducted to improve the computation time of various clustering algorithms [18-25].

The focus of this paper is on improving the computation time of Ant Colony Optimization (ACO)
for clustering. Compared to other traditional clustering algorithms such as Simulated Annealing, Genetic
Algorithm, and Tabu Search, ACO has been found to give better result quality [26]; thus the
improvement will yield a clustering algorithm with high result quality and reasonable computation time.
The strategy employed for reducing the computation time is pattern reduction. Pattern reduction looks for
redundant processes in the algorithm and removes them to decrease computation time. Pattern reduction is adopted
because, in ACO for clustering, redundancies in generating new solutions are observed. In the rest of this
paper, the proposed algorithm is called Fast Ant Colony Optimization for Clustering (FACOC) for
simplicity. The remainder of this paper is organized as follows. Section 2 provides the background
information. The proposed FACOC algorithm is presented and evaluated in Sections 3 and 4, respectively.
Brief conclusions are presented in Section 5.


2. RESEARCH METHOD 
2.1. Ant Colony Optimization 

Ant colony optimization (ACO) is a meta-heuristic optimization algorithm inspired by the
behavior of real ants [27]. ACO was originally developed as an algorithm to solve the travelling salesman
problem (TSP) and was introduced as Ant System (AS) [28]. In 1997, Ant System was improved to control its
exploration and exploitation, and was introduced as Ant Colony System (ACS) [29]. The basic ACO algorithm is
shown in Algorithm 1.


Algorithm 1. Pseudocode for ACO algorithm
initialize
While stopping condition is not met
    generate new solutions
    update pheromone
    local search
End While


ACO first initializes all of its parameters, such as the initial pheromone ($\tau_0$) and the evaporation
rate ($\rho$). The main body of the algorithm, shown in the 3rd to 5th lines of
Algorithm 1, is then repeated until the stopping condition is met.
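
The loop structure of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the paper: the function name `run_aco`, the callable arguments, the `state` dictionary, and the use of a fixed iteration budget as the stopping condition are all assumptions made for the example.

```python
from typing import Callable


def run_aco(initialize: Callable[[], dict],
            generate_solutions: Callable[[dict], list],
            update_pheromone: Callable[[dict, list], None],
            local_search: Callable[[dict, list], list],
            max_iterations: int) -> dict:
    """Generic ACO loop in the shape of Algorithm 1.

    The stopping condition is assumed to be a fixed number of iterations.
    """
    state = initialize()                             # parameters, pheromone table, ...
    for _ in range(max_iterations):                  # "while stopping condition is not met"
        solutions = generate_solutions(state)        # step 1: generate new solutions
        update_pheromone(state, solutions)           # step 2: update pheromone
        solutions = local_search(state, solutions)   # step 3: local search
        state["last_solutions"] = solutions
    return state
```

The three steps are passed in as functions so that the same loop can host either the TSP variant or the clustering variant described below.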

The first step of the main body is to generate new solutions. When generating a new solution, ACO
calculates the probability of each sub-solution based on the total pheromone that has been laid on it. For TSP,
the probability is calculated as shown in Equation (1).


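$$p_{ij} = \frac{[\tau_{ij}]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{k \in allowed} [\tau_{ik}]^{\alpha}\,[\eta_{ik}]^{\beta}}, \quad j \in allowed \tag{1}$$


In Equation (1), $\tau_{ij}$ is the pheromone value for the edge between cities i and j, $\eta_{ij}$ is the inverse of the
distance between cities i and j, and $allowed$ is the set of sub-solutions that are allowed to be used according to the
taboo list. The exponents $\alpha$ and $\beta$ weight the relative influence of the pheromone and the distance.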
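
As an illustration of Equation (1), the sketch below computes the transition probabilities for one ant standing at city i. The exponents `alpha` and `beta` (weights on pheromone and inverse distance, as in the standard Ant System rule), their default values, and the toy distance matrix are assumptions chosen for the example, not values from the paper.

```python
def transition_probabilities(tau, eta, i, allowed, alpha=1.0, beta=2.0):
    """Return {j: p_ij} for the cities in `allowed`, in the form of Equation (1)."""
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in allowed}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Toy example: 3 cities, uniform pheromone, eta = 1 / distance.
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
tau = [[1.0] * 3 for _ in range(3)]
eta = [[0.0 if a == b else 1.0 / dist[a][b] for b in range(3)] for a in range(3)]

# City 0 has already been visited, so only cities 1 and 2 are allowed.
probs = transition_probabilities(tau, eta, 0, allowed=[1, 2])
```

With uniform pheromone, the closer city (city 1) receives the larger probability because only the distance term differs.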

After generating new solutions, the pheromone table is updated based on the quality of the newly 
generated solutions. The pheromone update process follows the calculation shown in Equation (2). 


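$$\tau_{ij}(t) = (1 - \rho)\,\tau_{ij} + \rho \sum_{k=1} \Delta\tau_{ij}^{k} \tag{2}$$

$$\Delta\tau_{ij}^{k} = \frac{1}{L_k} \tag{3}$$


$\Delta\tau_{ij}^{k}$ in Equation (2) is calculated as in Equation (3), where $L_k$ can be one of the following: the length of the
tour of the $k$-th ant, the length of the best tour in the current iteration, or the length of the best solution acquired so far. In
ACS, $L_k$ is the length of the best tour found since the start of the algorithm.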
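
The pheromone update of Equations (2) and (3) can be sketched as below. Several details are assumptions made for the example rather than statements from the paper: the deposit is taken as 1 / L_k, each ant deposits only on the edges of its own tour, the TSP is symmetric, and the function name `update_pheromone` is hypothetical.

```python
def update_pheromone(tau, tours, lengths, rho):
    """Apply an Equation (2)-style update in place.

    tours   : list of tours, each a list of city indices (closed implicitly)
    lengths : tour length L_k for each tour
    rho     : evaporation rate
    """
    n = len(tau)
    deposit = [[0.0] * n for _ in range(n)]
    for tour, length in zip(tours, lengths):
        delta = 1.0 / length                          # assumed form of Equation (3)
        for a, b in zip(tour, tour[1:] + tour[:1]):   # consecutive edges, back to start
            deposit[a][b] += delta
            deposit[b][a] += delta                    # symmetric TSP assumed
    for i in range(n):
        for j in range(n):
            tau[i][j] = (1.0 - rho) * tau[i][j] + rho * deposit[i][j]
    return tau
```

Edges that no ant used only evaporate, while edges on short tours gain the most pheromone.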

The last step in the main body is local search, which is used to improve the quality of the
solutions that have been acquired. In the case of TSP, this step can use one of the popular
local search procedures for TSP, such as 2-opt, 3-opt, and Lin-Kernighan.
2.2. ACO for Clustering

In 2004, Shelokar et al. [26] introduced the use of ACO for the clustering problem. The idea is to
encode a solution of the clustering problem as a string; each string element represents a
sample data, and its content represents the cluster number to which the sample data is assigned. Each ant in
ACO then builds a solution based on this string representation. Table 1 shows the matrix representation of the
solution strings for a dataset that has N = 8 sample data and K = 3 cluster numbers, using M = 5 ants.


Table 1. Matrix representation example for solution strings in ACO for clustering

       Sample data
Ant    1  2  3  4  5  6  7  8
S1     2  1  2  2  3  ?  ?  ?
S2     1  3  2  1  1  2  3  3
S3     1  2  3  3  3  1  2  2
S4     3  1  1  3  1  2  1  3
S5     ?  ?  ?  3  3  3  2  1
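
To make the string representation concrete, the sketch below decodes one solution string into its clusters. The function name `clusters_from_string` is hypothetical, and the string used is the second row of Table 1.

```python
def clusters_from_string(solution, k):
    """Group 1-based sample indices by the cluster number stored in the string."""
    groups = {c: [] for c in range(1, k + 1)}
    for sample, cluster in enumerate(solution, start=1):
        groups[cluster].append(sample)
    return groups


s2 = [1, 3, 2, 1, 1, 2, 3, 3]            # second solution string in Table 1
groups = clusters_from_string(s2, k=3)
# groups == {1: [1, 4, 5], 2: [3, 6], 3: [2, 7, 8]}
```

Position in the string identifies the sample, and the stored value identifies its cluster, so a full population of M ants is simply an M x N matrix of cluster numbers.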


The general steps of ACO for clustering follow the same steps as shown in Algorithm 1. The first
step in the main body is to generate new solutions. In ACO for clustering, an ant can choose a new cluster
number in two ways, based on a pre-determined value $q_0$. The first way is to choose the cluster having the
maximum amount of pheromone among the cluster numbers of the same sample data. An illustration of the
matrix representation of the pheromones is given in Table 2 and Table 3. The second way is to choose the cluster
number with the probability given in Equation (4), where $p_{ij}$ is the probability of cluster number j being chosen
for sample data i. The first way is analogous to the exploitation process used in ACS [29], while
the second way is analogous to biased exploration in ACS. In this step, each ant updates the pheromone
matrix by using Equation (2).
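
The two ways of choosing a cluster number can be sketched as follows. The paper only states that the choice depends on the pre-determined value q0; the convention used here (exploit when a uniform random draw is at most q0, otherwise sample in proportion to the pheromone) is an assumption borrowed from ACS, and the function name `choose_cluster` is hypothetical.

```python
import random


def choose_cluster(tau_row, q0, rng=random):
    """Pick a 1-based cluster number for one sample from its pheromone row."""
    if rng.random() <= q0:
        # First way (exploitation): the cluster with the most pheromone.
        return max(range(len(tau_row)), key=lambda j: tau_row[j]) + 1
    # Second way (biased exploration): roulette wheel with
    # probability tau_ij / sum_k tau_ik for cluster j.
    threshold = rng.random() * sum(tau_row)
    cumulative = 0.0
    for j, t in enumerate(tau_row):
        cumulative += t
        if threshold <= cumulative:
            return j + 1
    return len(tau_row)   # guard against floating-point round-off
```

For example, `choose_cluster([0.014756, 0.015274, 0.009900], q0=1.0)` always returns cluster 2, the entry with the most pheromone in the first row of Table 2.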

The second step in the main body is updating the pheromone trail. In this step, only the best ants
update the pheromone matrix. The update process is similar to the pheromone update process in the first step,
except that it uses a different evaporation rate $\alpha$, as shown in Equation (5). The value of $\Delta\tau_{ij}^{k}$ is
calculated analogously to Equation (3), as shown in Equation (6), where $F^{k}$ is the quality of the $k$-th solution.


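$$p_{ij} = \frac{\tau_{ij}}{\sum_{k=1}^{K} \tau_{ik}} \tag{4}$$

$$\tau_{ij}(t) = (1 - \alpha)\,\tau_{ij} + \alpha \sum_{k=1} \Delta\tau_{ij}^{k} \tag{5}$$

$$\Delta\tau_{ij}^{k} = \frac{1}{F^{k}} \tag{6}$$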


The pheromone is stored in a matrix as shown in Table 2. In order to use the pheromone in
Equation (5), the pheromones must be normalized so that, for each sample data, the pheromone
concentrations sum to 1. Table 3 shows an example of a normalized pheromone matrix.


Table 2. Example of pheromone matrix in ACO for clustering

Sample Data    Clusters
               1           2           3
1              0.014756    0.015274    0.009900
2              0.015274    0.009900    0.014756
3              0.015274    0.014756    0.009900
4              0.009900    0.015274    0.014756
5              0.014756    0.015274    0.009900
6              0.009900    0.014756    0.015274
7              0.009900    0.020131    0.009900
8              0.015274    0.014756    0.009900

The third step in the main body is to perform local search. The local search procedures described above are
specific to TSP. To adapt the local search paradigm,
ACO for clustering uses an operator similar to the mutation operator in Genetic Algorithm. This local search step is
applied only to the 20% of the ants with the best solution quality. It starts by generating a
random number for each sample data in a solution. If a sample data has a random number below a pre-determined
threshold value $p_{ls}$, its cluster number must be changed to a different cluster number.
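
The mutation-like local search can be sketched as below. The paper states only that the cluster number must change to a different one; drawing the replacement uniformly at random from the other cluster numbers is an assumption made for this example, and `mutate_solution` is a hypothetical name.

```python
import random


def mutate_solution(solution, k, p_ls, rng=random):
    """Return a copy of `solution` in which every sample whose random draw
    falls below p_ls is moved to a different cluster number in 1..k."""
    mutated = list(solution)
    for i, current in enumerate(mutated):
        if rng.random() < p_ls:
            others = [c for c in range(1, k + 1) if c != current]
            mutated[i] = rng.choice(others)   # assumed: uniform choice among the others
    return mutated
```

With a small threshold, most samples keep their cluster number and only a few are perturbed, which keeps the step cheap while still exploring nearby solutions.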


Table 3. Example of normalized pheromone matrix in ACO for clustering

Sample Data    Clusters
               1         2         3
1              0.3695    0.3825    0.2479
2              0.3825    0.2479    0.3695
3              0.3825    0.3695    0.2479
4              0.2479    0.3825    0.3695
5              0.3695    0.3825    0.2479
6              0.2479    0.3695    0.3825
7              0.2479    0.5041    0.2479
8              0.3825    0.3695    0.2479
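
The relation between Table 2 and Table 3 is a row-wise normalization, which can be checked with a short sketch. The function name `normalize_rows` is hypothetical; the two rows used are samples 1 and 7 of Table 2.

```python
def normalize_rows(pheromone):
    """Scale each row so that its entries sum to 1."""
    return [[value / sum(row) for value in row] for row in pheromone]


table2_rows = [
    [0.014756, 0.015274, 0.009900],   # sample 1 in Table 2
    [0.009900, 0.020131, 0.009900],   # sample 7 in Table 2
]
normalized = normalize_rows(table2_rows)
rounded = [[round(value, 4) for value in row] for row in normalized]
# rounded == [[0.3695, 0.3825, 0.2479], [0.2479, 0.5041, 0.2479]]
```

The rounded values match the corresponding rows of Table 3.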


3. PROPOSED ALGORITHM 
3.1. The Concept 

FACOC improves the computation time of ACO by using pattern reduction. Pattern reduction looks
for redundant computations that usually occur in heuristic-based optimization algorithms and cuts them to
reduce the computation time [30-32]. Pattern reduction reduces the computation time significantly, at the cost of
a small degradation in solution quality.

In the case of ACO for the clustering problem [26], redundant computation similar to what Tseng et al.
found [32] is observed. As the iterations proceed, the pheromone of a sample data for a certain cluster number
becomes more and more intense. This drives the ants to choose the same cluster number with almost
certain probability, which results in redundant computation. The redundancy grows
further as the algorithm approaches convergence. Cutting this redundancy reduces the computation
time significantly.


3.2. Algorithm of FACOC 

In order to cut the redundancy, the algorithm needs to detect whether redundant repetition occurs. This is
done by keeping track of each time a cluster number is chosen for each sample data. The counts are recorded in
a matrix v with dimension N x K, where N is the number of sample data and K is the total number of
clusters. Table 4 shows an example of the initial state of matrix v for a dataset that has 8 sample data and 3
cluster numbers.

Each time a cluster number is chosen for a sample data, the corresponding value in matrix v is
incremented by 1. For example, if an ant chooses 1 as the cluster number for the second sample data, the value
of matrix v in row 2, column 1 is incremented by 1.
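
The bookkeeping for matrix v can be sketched as follows. The function names are hypothetical; sample indices and cluster numbers are treated as 1-based, as in the text.

```python
def new_tracker(n_samples, n_clusters):
    """Initial state of matrix v: an N x K table of zeros."""
    return [[0] * n_clusters for _ in range(n_samples)]


def record_choice(v, sample, cluster):
    """Increment v for a 1-based sample index and a 1-based cluster number."""
    v[sample - 1][cluster - 1] += 1


def record_solution(v, solution):
    """Record every choice made in one ant's solution string."""
    for sample, cluster in enumerate(solution, start=1):
        record_choice(v, sample, cluster)


v = new_tracker(8, 3)
record_choice(v, sample=2, cluster=1)   # the example from the text: row 2, column 1
```

After each iteration, calling `record_solution` once per ant adds at most M to any single entry of v.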


Table 4. Example of initial state of matrix v

Sample Data    Clusters
               1    2    3
1              0    0    0
2              0    0    0
3              0    0    0
4              0    0    0
5              0    0    0
6              0    0    0
7              0    0    0
8              0    0    0


To cut redundant repetition, a threshold value $\Psi$ is introduced at the start of the algorithm. The value of
$\Psi$ is set to $I \times M \times \lambda$. $I$ is the number of iterations considered before a cluster number becomes
common; it determines the iteration point at which cluster numbers start to become common. $M$ is the
total number of ants. $\lambda$ controls the percentage of $\Psi$ to be used. Whenever a value
in matrix v exceeds $\Psi$, the corresponding cluster number becomes the common cluster number for the related
sample data. The existence of a common cluster number is recorded in a vector $c$ of size N. The
corresponding value of $c$ is set to 1 when a cluster number has become common and is 0 otherwise. For
each sample data whose $c$ value is 1, all ants are forced to choose the common cluster number,
skipping any probability calculation. This modified cluster number choosing process cuts the redundant
probability calculation and thus decreases the computation time. The modified cluster number choosing
process is given by Equation (7), where $C_{common,n}$ is the cluster number that has already become common for the
$n$-th sample data and $C_{maxp,n}$ is the cluster number that has the most pheromone for the $n$-th sample data. Besides the
cluster number choosing process, the local search process is also affected by vector $c$. If the value of $c$ for a sample
data is 1, that sample data is no longer affected by the local search process. The complete process of FACOC is
summarized in Algorithm 2.





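$$p_{ij} = \begin{cases}
C_{common,n} & \text{if } c_n = 1 \\
C_{maxp,n} & \text{if } q \le q_0 \\
C_{j,n} \text{ with probability } \dfrac{\tau_{nj}}{\sum_{k=1}^{K} \tau_{nk}} & \text{otherwise}
\end{cases} \tag{7}$$
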
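
The choice rule of Equation (7) can be sketched as below. This is an illustration under assumptions: the common cluster of a sample is passed as `None` until one exists (the paper uses a 0/1 vector instead), exploitation is triggered when a uniform random draw is at most q0, and `facoc_choose` is a hypothetical name.

```python
import random


def facoc_choose(tau_row, common_cluster, q0, rng=random):
    """Choose a 1-based cluster number for one sample.

    common_cluster is None until a cluster number has become common
    for this sample; afterwards it holds that cluster number.
    """
    if common_cluster is not None:
        return common_cluster          # forced choice, no probability calculation
    if rng.random() <= q0:
        # Cluster with the most pheromone for this sample.
        return max(range(len(tau_row)), key=lambda j: tau_row[j]) + 1
    # Otherwise sample cluster j with probability tau_nj / sum_k tau_nk.
    return rng.choices(range(1, len(tau_row) + 1), weights=tau_row, k=1)[0]
```

The first branch is where the saving comes from: once a cluster number is common, neither the maximum search nor the weighted sampling is executed for that sample.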
Algorithm 2. Pseudocode for FACOC algorithm
while iteration < max iteration do
    update matrix v
    check if a cluster number has become common
    generate new solutions
    update pheromone
    local search
end while
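
The "check if a cluster number has become common" step of Algorithm 2 can be sketched as follows. The representation is an assumption made for the example: instead of a separate 0/1 vector, `common[i]` stores the common cluster number of sample i, or `None` when there is none yet. The name `update_common` is hypothetical.

```python
def update_common(v, common, psi):
    """Mark a cluster number as common once its count in v exceeds psi.

    v      : N x K matrix of counts
    common : list of length N; entry is a 1-based cluster number or None
    psi    : threshold value
    """
    for i, counts in enumerate(v):
        if common[i] is not None:
            continue                   # already fixed for this sample
        for j, count in enumerate(counts):
            if count > psi:
                common[i] = j + 1
                break
    return common


v_example = [[5, 0, 0],
             [1, 2, 2]]
common = update_common(v_example, [None, None], psi=3)
# common == [1, None]
```

A sample whose entry is set is then skipped by both the cluster number choosing process and the local search.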


4. PERFORMANCE ANALYSIS 
4.1. Research Environment 

The research was conducted on a computer with an Intel Core i7 processor and 4 GB of random access
memory (RAM). The operating system used is Windows 7, and the program is built using the MATLAB
programming language.

The datasets used in this research were obtained from the UCI machine learning website [33]. Five datasets
were taken from the UCI machine learning website: Iris [34], Wine [35], Parkinsons [36],
Connectionist Bench (Sonar, Mines versus Rocks) [37], and Haberman’s Survival [38]. The description of the
datasets is shown in Table 5.


Table 5. Description of the datasets

Dataset                                            Number of Patterns    Number of Clusters    Number of Attributes
Iris                                               150                   3                     4
Wine                                               178                   3                     13
Parkinsons                                         195                   2                     22
Connectionist Bench (Sonar, Mines versus Rocks)    208                   2                     60
Haberman’s Survival                                306                   2                     3


4.2. Parameter Initialization 

For this experiment, the initial values of the parameters used in ACO for clustering and FACOC are
shown in Table 6. Note that parameter $\lambda$ is exclusive to FACOC. These parameter values are chosen because
they give the best results for the algorithm.

As for parameter $I$ and the maximum iteration, they are determined separately for each dataset. The
maximum iteration is set according to the number of iterations at which the basic ACO usually obtains its
optimal solution. The parameter $I$ is set to around 20% of the maximum iteration. The experiments show that
setting $I$ to a number much lower than 20% significantly decreases the quality of the solution. In contrast, setting it
much higher than 20% makes the decrease in computation time much less significant. The maximum
iteration and the value of $I$ are shown in Table 7.
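
As a worked example of how these parameters combine, the sketch below evaluates the threshold for each raw dataset. It assumes that the threshold is the product of I, the number of ants M, and the FACOC-specific scaling parameter, and it uses the values listed in Tables 6 and 7 (25 ants, scaling parameter 1). The names `threshold`, `I_VALUES`, and `psi` are hypothetical.

```python
def threshold(i_param, n_ants, scale):
    """Threshold on the counts in matrix v, assumed to be I * M * scale."""
    return i_param * n_ants * scale


# Values of I for the raw datasets (Table 7).
I_VALUES = {
    "Iris": 120,
    "Wine": 210,
    "Parkinsons": 80,
    "Connectionist Bench": 80,
    "Haberman's Survival": 100,
}

psi = {name: threshold(i, n_ants=25, scale=1) for name, i in I_VALUES.items()}
# psi["Iris"] == 3000, psi["Wine"] == 5250
```

Because each entry of v can grow by at most M per iteration, a threshold of I x M with a scaling parameter of 1 cannot be exceeded before iteration I, which is consistent with I marking the point at which cluster numbers start to become common.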


4.3. Performance Evaluation 

In this research, the sum squared error is used to evaluate the quality of solutions. The sum
squared error formula is shown in Equation (8), where $x_{iv}$ is the attribute value of the $v$-th dimension of the $i$-th
sample data; $m_{jv}$ is the $v$-th dimension value of the centroid of the $j$-th cluster; and $w_{ij}$ is a weight that indicates
whether the $i$-th sample data belongs to the $j$-th cluster: the value of $w_{ij}$ is 1 if the $i$-th sample data belongs to the
$j$-th cluster and 0 otherwise.


Table 6. Initialization of parameter for ACO for clustering and FACOC

Parameter                             Value
Number of Ants ($M$)                  25
Local evaporation rate ($\rho$)       0.9
Global evaporation rate ($\alpha$)    0.9
$q_0$                                 0.98
$p_{ls}$                              0.01
$\lambda$                             1

Table 7. Initialization of parameter I and maximum iteration for raw dataset

Dataset                                            I      Max Iteration
Iris                                               120    600
Wine                                               210    800
Parkinsons                                         80     400
Connectionist Bench (Sonar, Mines versus Rocks)    80     400
Haberman’s Survival                                100    500


$$F(w, m) = \sum_{j=1}^{K} \sum_{i=1}^{N} \sum_{v=1}^{n} w_{ij}\,\lVert x_{iv} - m_{jv} \rVert^{2} \tag{8}$$
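
The sum squared error can be computed directly from a solution string, as in the sketch below. It assumes that the centroid of a cluster is the mean of the samples assigned to it; the function names are hypothetical, and the tiny dataset is made up for the example.

```python
def centroids(data, solution, k):
    """Mean of the samples assigned to each cluster (1-based cluster numbers)."""
    dims = len(data[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for x, c in zip(data, solution):
        counts[c - 1] += 1
        for d in range(dims):
            sums[c - 1][d] += x[d]
    # An empty cluster keeps a zero centroid; it contributes nothing to the error.
    return [[s / counts[j] if counts[j] else 0.0 for s in sums[j]] for j in range(k)]


def sum_squared_error(data, solution, k):
    """Sum over samples and dimensions of the squared distance to the assigned centroid."""
    m = centroids(data, solution, k)
    return sum((x[d] - m[c - 1][d]) ** 2
               for x, c in zip(data, solution)
               for d in range(len(x)))


data = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]   # made-up samples
sse = sum_squared_error(data, [1, 1, 2], k=2)    # 2.0
```

Iterating over the solution string plays the role of the weight: each sample contributes only to the cluster it is assigned to.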


This research focuses on significantly decreasing the computation time taken by ACO for clustering.
In addition, the degradation in solution quality, measured by Equation (8), must stay below
a tolerance value of 5%. The result is a system with a significant decrease in computation time and
a tolerable quality degradation.


4.4. Initial Experimental Analysis 

To confirm that FACOC actually reduces the computation time, an initial experiment was
conducted. In this experiment, FACOC and basic ACO for clustering are run, and their
computation times over the iterations are plotted together. The plot can be seen in Figure 1. The red line
shows the computation time taken by FACOC, while the blue line belongs to basic ACO for clustering. For this
initial experiment, the dataset used is Iris, with parameter I = 120 and a maximum iteration of 600.

It can be seen from Figure 1 that, at first, the computation time taken by FACOC is larger
than that of basic ACO for clustering. This behavior is expected, as FACOC injects additional operations into
the program. However, the behavior changes when the number of cluster numbers that have become common
increases. This starts at about the 120th iteration, which is the value of parameter I. From
that point, the computation time of FACOC grows significantly more slowly than that of basic ACO for clustering. At
the 280th iteration, FACOC catches up with basic ACO for clustering and keeps growing more slowly. As a
result, from that point on, the computation time taken by FACOC is lower than that of basic ACO for
clustering.
4.5. Experimental Result

The experimental result is shown in Table 8. From Table 8, it can be seen that, across the different
datasets, the computation time of FACOC decreases by 27.586% on average. On
the other hand, the quality decreases by only 3.0352% on average.


Table 8. Experimental result on raw datasets 




The results shown in Table 8 use the raw data given in each dataset. This approach may be biased
because the scale of the values varies between the datasets. To overcome this problem, the performance
on normalized datasets is also recorded. The normalized dataset values are calculated as shown in
Equation (9); all of them are scaled between 0 and 1. In Equation (9), argmin{x(i)} and argmax{x(i)}
respectively represent the minimum and maximum attribute values for the i-th dimension. Because the
characteristics of a dataset after normalization are different from those of the raw dataset, the maximum
iteration is also determined differently. The value of parameter I and the maximum iteration for the normalized
datasets are shown in Table 9. The results are recorded in Table 10. From Table 10, it can
be seen that the scale indeed affects the result. Despite that, FACOC still achieves a significant
reduction in computation time of 28.595% on average. The degradation of the quality also remains
small, at 1.6569% on average.
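
The scaling described above is a per-attribute min-max normalization, which can be sketched as follows. The function name `min_max_normalize`, the handling of constant attributes (mapped to 0.0), and the small dataset are assumptions made for the example.

```python
def min_max_normalize(data):
    """Scale every attribute (column) of `data` to the range [0, 1]."""
    columns = list(zip(*data))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0   # constant column: assumed 0.0
             for x, lo, hi in zip(row, lows, highs)]
            for row in data]


raw = [[1.0, 10.0],
       [3.0, 20.0],
       [5.0, 30.0]]
scaled = min_max_normalize(raw)
# scaled == [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

After scaling, every attribute contributes on the same scale to the sum squared error, so attributes with large raw ranges no longer dominate the quality measure.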


$$x'_{iv} = \frac{x_{iv} - \operatorname{argmin}\{x(i)\}}{\operatorname{argmax}\{x(i)\} - \operatorname{argmin}\{x(i)\}} \tag{9}$$


Table 9. Initialization of parameter I and maximum iteration for normalized dataset

Dataset                                            I      Max Iteration
Iris                                               120    600
Wine                                               100    500
Parkinsons                                         80     400
Connectionist Bench (Sonar, Mines versus Rocks)    160    800
Haberman’s Survival                                200    1000


Table 10. Experimental result on normalized datasets

                                                   ACO for Clustering          FACOC                       Difference
Dataset                                            Avg. Time    Avg. Quality   Avg. Time    Avg. Quality   Time        Quality
Iris                                               ?            ?              ?            ?              -33.159%    ?
Wine                                               0.943        49.0618        0.666        50.0867        -30.144%    2.09%
Parkinsons                                         1.685        98.8003        0.508        100.3685       -35.819%    1.59%
Connectionist Bench (Sonar, Mines versus Rocks)    1.462        446.1638       1.1          446.7613       -24.804%    0.13%
Haberman’s Survival                                2.156        25.341         1.528        25.8473        -29.104%    2.00%
Average                                                                                                    ?           1.66%
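
The difference columns can be read as relative changes with respect to basic ACO for clustering. The sketch below checks this reading against the quality values of the Wine row; the function name `percent_difference` is hypothetical, and the interpretation itself is an assumption that this one row happens to support.

```python
def percent_difference(baseline, value):
    """Relative change of `value` with respect to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# Average quality for Wine: basic ACO for clustering vs. FACOC.
quality_change = percent_difference(49.0618, 50.0867)
# round(quality_change, 2) == 2.09
```

A positive value means a larger sum squared error, that is, a quality degradation; a negative value in the time column means a reduction in computation time.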
Figure 1. Plot of time vs iteration for FACOC and ACO for clustering algorithm